{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_4_dropout.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 5: Regularization and Dropout**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 5 Material\n",
    "\n",
    "* Part 5.1: Part 5.1: Introduction to Regularization: Ridge and Lasso [[Video]](https://www.youtube.com/watch?v=jfgRtCYjoBs&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_1_reg_ridge_lasso.ipynb)\n",
    "* Part 5.2: Using K-Fold Cross Validation with Keras [[Video]](https://www.youtube.com/watch?v=maiQf8ray_s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_2_kfold.ipynb)\n",
    "* Part 5.3: Using L1 and L2 Regularization with Keras to Decrease Overfitting [[Video]](https://www.youtube.com/watch?v=JEWzWv1fBFQ&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_3_keras_l1_l2.ipynb)\n",
    "* **Part 5.4: Drop Out for Keras to Decrease Overfitting** [[Video]](https://www.youtube.com/watch?v=bRyOi0L6Rs8&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_4_dropout.ipynb)\n",
    "* Part 5.5: Benchmarking Keras Deep Learning Regularization Techniques [[Video]](https://www.youtube.com/watch?v=1NLBwPumUAs&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_5_bootstrap.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running the correct version of TensorFlow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: not using Google CoLab\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    %tensorflow_version 2.x\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 5.4: Drop Out for Keras to Decrease Overfitting\n",
    "\n",
    "Hinton, Srivastava, Krizhevsky, Sutskever, & Salakhutdinov (2012) introduced the dropout regularization algorithm. [[Cite:srivastava2014dropout]](http://www.jmlr.org/papers/volume15/nandan14a/nandan14a.pdf) Although dropout works differently than L1 and L2, it accomplishes the same goal—the prevention of overfitting. However, the algorithm does the task by actually removing neurons and connections—at least temporarily. Unlike L1 and L2, no weight penalty is added. Dropout does not directly seek to train small weights.\n",
    "Dropout works by causing hidden neurons of the neural network to be unavailable during part of the training. Dropping part of the neural network causes the remaining portion to be trained to still achieve a good score even without the dropped neurons. This technique decreases co-adaptation between neurons, which results in less overfitting. \n",
    "\n",
    "Most neural network frameworks implement dropout as a separate layer. Dropout layers function like a regular, densely connected neural network layer. The only difference is that the dropout layers will periodically drop some of their neurons during training. You can use dropout layers on regular feedforward neural networks. \n",
    "\n",
    "The program implements a dropout layer as a dense layer that can eliminate some of its neurons. Contrary to popular belief about the dropout layer, the program does not permanently remove these discarded neurons. A dropout layer does not lose any of its neurons during the training process, and it will still have the same number of neurons after training. In this way, the program only temporarily masks the neurons rather than dropping them. \n",
    "Figure 5.DROPOUT shows how a dropout layer might be situated with other layers.\n",
    "\n",
    "**Figure 5.DROPOUT: Dropout Regularization**\n",
    "![Dropout Regularization](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_9_dropout.png \"Dropout Regularization\")\n",
    "\n",
    "The discarded neurons and their connections are shown as dashed lines. The input layer has two input neurons as well as a bias neuron. The second layer is a dense layer with three neurons and a bias neuron. The third layer is a dropout layer with six regular neurons even though the program has dropped 50% of them. While the program drops these neurons, it neither calculates nor trains them. However, the final neural network will use all of these neurons for the output. As previously mentioned, the program only temporarily discards the neurons. \n",
    "\n",
    "The program chooses different sets of neurons from the dropout layer during subsequent training iterations. Although we chose a probability of 50% for dropout, the computer will not necessarily drop three neurons. It is as if we flipped a coin for each of the dropout candidate neurons to choose if that neuron was dropped out. You must know that the program should never drop the bias neuron. Only the regular neurons on a dropout layer are candidates.\n",
    "The implementation of the training algorithm influences the process of discarding neurons. The dropout set frequently changes once per training iteration or batch. The program can also provide intervals where all neurons are present. Some neural network frameworks give additional hyper-parameters to allow you to specify exactly the rate of this interval. \n",
    "\n",
    "Why dropout is capable of decreasing overfitting is a common question. The answer is that dropout can reduce the chance of codependency developing between two neurons. Two neurons that develop codependency will not be able to operate effectively when one is dropped out. As a result, the neural network can no longer rely on the presence of every neuron, and it trains accordingly. This characteristic decreases its ability to memorize the information presented, thereby forcing generalization.\n",
    "\n",
    "Dropout also decreases overfitting by forcing a bootstrapping process upon the neural network. Bootstrapping is a prevalent ensemble technique. Ensembling is a technique of machine learning that combines multiple models to produce a better result than those achieved by individual models. The ensemble is a term that originates from the musical ensembles in which the final music product that the audience hears is the combination of many instruments.  \n",
    "\n",
    "Bootstrapping is one of the most simple ensemble techniques. The bootstrapping programmer simply trains several neural networks to perform precisely the same task. However, each neural network will perform differently because of some training techniques and the random numbers used in the neural network weight initialization. The difference in weights causes the performance variance. The output from this ensemble of neural networks becomes the average output of the members taken together. This process decreases overfitting through the consensus of differently trained neural networks.  \n",
    "\n",
    "Dropout works somewhat like bootstrapping. You might think of each neural network that results from a different set of neurons being dropped out as an individual member in an ensemble. As training progresses, the program creates more neural networks in this way. However, dropout does not require the same amount of processing as bootstrapping. The new neural networks created are temporary; they exist only for a training iteration. The final result is also a single neural network rather than an ensemble of neural networks to be averaged together.\n",
    "\n",
    "The following animation shows how dropout works: [animation link](https://yusugomori.com/projects/deep-learning/dropout-relu)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from scipy.stats import zscore\n",
    "\n",
    "# Read the data set\n",
    "df = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv\",\n",
    "    na_values=['NA','?'])\n",
    "\n",
    "# Generate dummies for job\n",
    "df = pd.concat([df,pd.get_dummies(df['job'],prefix=\"job\")],axis=1)\n",
    "df.drop('job', axis=1, inplace=True)\n",
    "\n",
    "# Generate dummies for area\n",
    "df = pd.concat([df,pd.get_dummies(df['area'],prefix=\"area\")],axis=1)\n",
    "df.drop('area', axis=1, inplace=True)\n",
    "\n",
    "# Missing values for income\n",
    "med = df['income'].median()\n",
    "df['income'] = df['income'].fillna(med)\n",
    "\n",
    "# Standardize ranges\n",
    "df['income'] = zscore(df['income'])\n",
    "df['aspect'] = zscore(df['aspect'])\n",
    "df['save_rate'] = zscore(df['save_rate'])\n",
    "df['age'] = zscore(df['age'])\n",
    "df['subscriptions'] = zscore(df['subscriptions'])\n",
    "\n",
    "# Convert to numpy - Classification\n",
    "x_columns = df.columns.drop('product').drop('id')\n",
    "x = df[x_columns].values\n",
    "dummies = pd.get_dummies(df['product']) # Classification\n",
    "products = dummies.columns\n",
    "y = dummies.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we will see how to apply dropout to classification."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fold #1\n",
      "Fold score (accuracy): 0.68\n",
      "Fold #2\n",
      "Fold score (accuracy): 0.695\n",
      "Fold #3\n",
      "Fold score (accuracy): 0.7425\n",
      "Fold #4\n",
      "Fold score (accuracy): 0.71\n",
      "Fold #5\n",
      "Fold score (accuracy): 0.6625\n",
      "Final score (accuracy): 0.698\n"
     ]
    }
   ],
   "source": [
    "########################################\n",
    "# Keras with dropout for Classification\n",
    "########################################\n",
    "\n",
    "import pandas as pd\n",
    "import os\n",
    "import numpy as np\n",
    "from sklearn import metrics\n",
    "from sklearn.model_selection import KFold\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense, Activation, Dropout\n",
    "from tensorflow.keras import regularizers\n",
    "\n",
    "# Cross-validate\n",
    "kf = KFold(5, shuffle=True, random_state=42)\n",
    "    \n",
    "oos_y = []\n",
    "oos_pred = []\n",
    "fold = 0\n",
    "\n",
    "for train, test in kf.split(x):\n",
    "    fold+=1\n",
    "    print(f\"Fold #{fold}\")\n",
    "        \n",
    "    x_train = x[train]\n",
    "    y_train = y[train]\n",
    "    x_test = x[test]\n",
    "    y_test = y[test]\n",
    "    \n",
    "    #kernel_regularizer=regularizers.l2(0.01),\n",
    "    \n",
    "    model = Sequential()\n",
    "    model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1\n",
    "    model.add(Dropout(0.5))\n",
    "    model.add(Dense(25, activation='relu', \\\n",
    "                activity_regularizer=regularizers.l1(1e-4))) # Hidden 2\n",
    "    # Usually do not add dropout after final hidden layer\n",
    "    #model.add(Dropout(0.5)) \n",
    "    model.add(Dense(y.shape[1],activation='softmax')) # Output\n",
    "    model.compile(loss='categorical_crossentropy', optimizer='adam')\n",
    "\n",
    "    model.fit(x_train,y_train,validation_data=(x_test,y_test),\\\n",
    "              verbose=0,epochs=500)\n",
    "    \n",
    "    pred = model.predict(x_test)\n",
    "    \n",
    "    oos_y.append(y_test)\n",
    "    # raw probabilities to chosen class (highest probability)\n",
    "    pred = np.argmax(pred,axis=1) \n",
    "    oos_pred.append(pred)        \n",
    "\n",
    "    # Measure this fold's accuracy\n",
    "    y_compare = np.argmax(y_test,axis=1) # For accuracy calculation\n",
    "    score = metrics.accuracy_score(y_compare, pred)\n",
    "    print(f\"Fold score (accuracy): {score}\")\n",
    "\n",
    "\n",
    "# Build the oos prediction list and calculate the error.\n",
    "oos_y = np.concatenate(oos_y)\n",
    "oos_pred = np.concatenate(oos_pred)\n",
    "oos_y_compare = np.argmax(oos_y,axis=1) # For accuracy calculation\n",
    "\n",
    "score = metrics.accuracy_score(oos_y_compare, oos_pred)\n",
    "print(f\"Final score (accuracy): {score}\")    \n",
    "    \n",
    "# Write the cross-validated prediction\n",
    "oos_y = pd.DataFrame(oos_y)\n",
    "oos_pred = pd.DataFrame(oos_pred)\n",
    "oosDF = pd.concat( [df, oos_y, oos_pred],axis=1 )\n",
    "#oosDF.to_csv(filename_write,index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3.9 (tensorflow)",
   "language": "python",
   "name": "tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
